{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 案例2：使用 Embedding 进行知识搜索\n",
    "\n",
    "## 使用嵌入的文本搜索相似度最高的评论内容 Get_embeddings_from_dataset.ipynb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from ast import literal_eval\n",
    "\n",
    "datafile_path = \"data/fine_food_reviews_with_embeddings_1k.csv\"\n",
    "#1.读取知识库中的embedding列\n",
    "df = pd.read_csv(datafile_path)\n",
    "df[\"embedding\"] = df.embedding.apply(literal_eval).apply(np.array)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Delicious!:  I enjoy this white beans seasoning, it gives a rich flavor to the beans I just love it, my mother in law didn't know about this Zatarain's brand and now she is traying different seasoning\n",
      "\n",
      "Fantastic Instant Refried beans:  Fantastic Instant Refried Beans have been a staple for my family now for nearly 20 years.  All 7 of us love it and my grown kids are passing on the tradition.\n",
      "\n",
      "Delicious:  While there may be better coffee beans available, this is my first purchase and my first time grinding my own beans.  I read several reviews before purchasing this brand, and am extremely \n",
      "\n"
     ]
    }
   ],
   "source": [
    "from utils.embedings_utils import get_embedding,cosine_similarity\n",
    "\n",
    "#2.定义函数，根据字符，查找评论中相似度最高的3条记录\n",
    "def search_reviews(df, product_description, n=3, pprint=True):\n",
    "    #1.将查询的字符串，调用embedding模型，进行embedding化\n",
    "    product_embedding = get_embedding(\n",
    "        product_description,\n",
    "        model=\"text-embedding-3-small\"\n",
    "    )\n",
    "    #2.根据查询字符串的embedding值，与embedding列，进行相似度计算,放入到新增列similarity\n",
    "    df[\"similarity\"] = df.embedding.apply(lambda x: cosine_similarity(x, product_embedding))\n",
    "\n",
    "    #3.对similarity列进行排序，取出前3条,输出标题和内容\n",
    "    results = (\n",
    "        df.sort_values(\"similarity\", ascending=False)\n",
    "        .head(n)\n",
    "        .combined.str.replace(\"Title: \", \"\")\n",
    "        .str.replace(\"; Content:\", \": \")\n",
    "    )\n",
    "    if pprint:\n",
    "        for r in results:\n",
    "            print(r[:200])\n",
    "            print()\n",
    "    return results\n",
    "\n",
    "#3.调用测试\n",
    "results = search_reviews(df, \"delicious beans\", n=3)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tasty and Quick Pasta:  Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an excellent chunky vegetable marinara.  I just wish there was more of it.  If you aren't starving or on a \n",
      "\n",
      "sooo good:  tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. cooks fast tastes great.I love this brand and started buying more of their pastas. Bulk is best.\n",
      "\n",
      "Bland and vaguely gamy tasting, skip this one:  As far as prepared dinner kits go, \"Barilla Whole Grain Mezze Penne with Tomato and Basil Sauce\" just did not do it for me...and this is coming from a p\n",
      "\n"
     ]
    }
   ],
   "source": [
    "results = search_reviews(df, \"whole wheat pasta\", n=3)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can search through these reviews easily. To speed up computation, we can use a special algorithm, aimed at faster search through embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "great product, poor delivery:  The coffee is excellent and I am a repeat buyer.  Problem this time was with the UPS delivery.  They left the box in front of my garage door in the middle of the drivewa\n",
      "\n"
     ]
    }
   ],
   "source": [
    "results = search_reviews(df, \"bad delivery\", n=1)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, this can immediately deliver a lot of value. In this example we show being able to quickly find the examples of delivery failures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Disappointed:  The metal cover has severely disformed. And most of the cookies inside have been crushed into small pieces. Shopping experience is awful. I'll never buy it online again.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "results = search_reviews(df, \"spoilt\", n=1)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Great food!:  I wanted a food for a a dog with skin problems. His skin greatly improved with the switch, though he still itches some.  He loves the food. No recalls, American made with American ingred\n",
      "\n",
      "Great food!:  I wanted a food for a a dog with skin problems. His skin greatly improved with the switch, though he still itches some.  He loves the food. No recalls, American made with American ingred\n",
      "\n"
     ]
    }
   ],
   "source": [
    "results = search_reviews(df, \"pet food\", n=2)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "openai",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
